Language identification: insights from the classification of hand annotated phone transcripts

نویسندگان

  • Timothy Kempton
  • Roger K. Moore
چکیده

Language Identification (LID) of speech can be split into two processes; phone recognition and language modelling. This two stage approach underlies some of the most successful LID systems. As phone recognizers become more accurate it is useful to simulate a very accurate phone recognizer to determine the effect on the overall LID accuracy. This can be done by using phone transcripts. In this paper LID is performed on phone transcripts from six different languages in the OGI multi-language telephone speech corpus. By simulating a phone recognizer that classifies phones into ten broad classes, a simple n-gram model gives low LID equal error rates (EER) of <1% on 30 seconds of test data. Language models based on these accurate phone transcripts can reveal insights into the phonology of different languages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mobile, L2 vocabulary learning, and fighting illiteracy: A case study of Iranian semi-illiterates beyond transition level

As mobile learning simultaneously employs both handheld computers and mobile telephones and other  devices  that  draw  on  the  same  set  of  functionalities,  it  throws  open  the  door  for  swift connection between learners  and teachers. This  study examined and articulated the impact of  the application of mobile devices for teaching English vocabulary items to 123 Iranian semi-illitera...

متن کامل

Incorporating Cognitive Linguistic Insights into Classrooms: the Case of Iranian Learners’ Acquisition of If-Clauses

Cognitive linguistics gives the most inclusive, consistent description of how language is organized, used and learned to date. Cognitive linguistics contains a great number of concepts that are useful to second language learners.  If-clauses in English, on the other hand, remain intriguing for foreign language learners to struggle with, due to their intrinsic intricacies. EFL grammar books are ...

متن کامل

The Role of Disfluencies in Topic Classification of Human-Human Conversations

We investigate the impact of disfluencies on the task of classifying natural human-human conversations into topics. Disfluencies are distinctive to spoken language, and their effect on a number of spoken language understanding tasks, including spoken language classification, remains largely unknown. We use a subset of Switchboard-I annotated for disfluencies and topics, and investigate the effe...

متن کامل

Creation of an Annotated German Broadcast Speech Database for Spoken Document Retrieval

In this paper we present a semi-automatic method for creating annotated data sets from German-language broadcast resources for which audio files as well as transcripts are available on the Internet. The transcripts are required to be reasonably accurate, but not perfect. Our approach is implemented by a integrated bundle of data processing tools, which support the human annotator in the creatio...

متن کامل

Core Units of Spoken Grammar in Global ELT Textbooks

Materials evaluation studies have constantly demonstrated that there is no one fixed procedure for conducting textbook evaluation studies. Instead, the criteria must be selected according to the needs and objectives of the context in which evaluation takes place. The speaking skill as part of the communicative competence has been emphasized as an important objective in language teaching. The pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008